How to Overcome the Domain Barriers in Pattern-Based Machine Translation System

Authors

Sung-Kwon Choi

Ki-Young Lee

Yoon-Hyung Roh

Oh-Woog Kwon

Young-Gil Kim

Abstract

One of the difficult issues in pattern-based machine translation is how to overcome domain differences when adapting a system from one domain to another. This paper describes how we resolved such domain barriers, namely the default target word of a domain, domain-specific patterns, and domain adaptation of engine modules, in a pattern-based machine translation system, specifically an English-Korean one. To this end, we discuss two types of customization method, where customization means adapting an existing system to a new domain. One is the pure customization method introduced for a patent machine translation system in 2006; the other is the upgraded customization method applied to a scientific paper machine translation system in 2007. By introducing the upgraded customization method, we were able to implement a practical machine translation system for scientific papers within 8 months, whereas the patent machine translation system took 24 months to complete with the pure customization method. The translation accuracy of the scientific paper machine translation system also rose from 77.25% to 81.10%, despite the short 8-month development period.

Similar resources

A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...

Automatic Semantic Role Labeling in Persian Sentences Using Dependency Trees

Automatic identification of words with semantic roles (such as Agent, Patient, Source, etc.) in sentences and attaching correct semantic roles to them, may lead to improvement in many natural language processing tasks including information extraction, question answering, text summarization and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...

An Optimal Approach to Local and Global Text Coherence Evaluation Combining Entity-based, Graph-based and Entropy-based Approaches

Text coherence evaluation becomes a vital and lovely task in Natural Language Processing subfields, such as text summarization, question answering, text generation and machine translation. Existing methods like entity-based and graph-based models are engaging with nouns and noun phrases change role in sequential sentences within short part of a text. They even have limitations in global coheren...

Experience of Health Leadership in Partnering With University-Based Researchers in Canada – A Call to “Re-imagine” Research

Background Emerging evidence that meaningful relationships with knowledge users are a key predictor of research use has led to promotion of partnership approaches to health research. However, little is known about health system experiences of collaborations with university-based researchers, particularly with research partnerships in the area of health system design and health service org...

A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in English are hyphenated, with the hyphen separating the different parts. Persian has multi-part words as well. According to Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...

Publication date: 2008
